CONSISTENT NON-PARAMETRIC BAYESIAN ESTIMATION
FOR A TIME-INHOMOGENEOUS BROWNIAN MOTION 

SHOTA GUGUSHVILI AND PETER SPREIJ 



Abstract. We establish posterior consistency for non-parametric Bayesian estimation of the dispersion coefficient of a time-inhomogeneous Brownian motion.



1. Introduction 



Consider a simple linear stochastic differential equation

(1) $dX_t = \sigma(t)\,dW_t, \quad X_0 = x, \quad t \in [0,1],$

where $W$ is a Brownian motion on some given probability space and the initial condition $x$ and the square integrable dispersion coefficient $\sigma$ are deterministic. We interpret equation (1) as a short-hand notation for the integral equation

$X_t = x + \int_0^t \sigma(s)\,dW_s, \quad t \in [0,1],$

where the integral is the Wiener integral of $\sigma$ with respect to the Brownian motion $W$. The process $X$ is thus a time-inhomogeneous Brownian motion. The function $\sigma$ can be viewed as a signal transmitted through a noisy channel, where the noise (modelled by the Brownian motion) is multiplicative. Note that $X$ is a Gaussian process with mean $m(t) = x$ and covariance $\rho(s,t) = \int_0^{s \wedge t} \sigma^2(u)\,du$. By $P_\sigma$ we will denote the law of the solution $X$ to (1).

Assume for simplicity that $x = 0$ and denote $t_{i,n} = i/n$, $i = 0, \ldots, n$. Suppose that corresponding to the true dispersion coefficient $\sigma = \sigma_0$, one has a sample $X_{t_{i,n}}$, $i = 1, \ldots, n$, from the process $X$ at one's disposal. Assuming that $\sigma_0$ belongs to some non-parametric class $\mathcal{X}$ of dispersion coefficients, our goal is to estimate $\sigma_0$. This problem for a similar model was treated in Genon-Catalot et al. (1992), Hoffmann (1997) and Soulier (1998) using a frequentist approach. However, a non-parametric Bayesian approach to estimation of $\sigma_0$ is also possible. The likelihood corresponding to the observations $X_{t_{i,n}}$ is given by

(2) $L_n(\sigma) = \prod_{i=1}^{n} \left\{ \frac{1}{\sqrt{2\pi \int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du}} \, \psi\!\left( \frac{X_{t_{i,n}} - X_{t_{i-1,n}}}{\sqrt{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du}} \right) \right\},$

where $\psi(u) = \exp(-u^2/2)$. For a prior $\Pi$ on $\mathcal{X}$, Bayes' formula yields the posterior measure

$\Pi(E \mid X_{t_{0,n}}, \ldots, X_{t_{n,n}}) = \frac{\int_E L_n(\sigma)\,\Pi(d\sigma)}{\int_{\mathcal{X}} L_n(\sigma)\,\Pi(d\sigma)}$

of any measurable set $E \subset \mathcal{X}$. In the Bayesian paradigm, the posterior encodes all the information required for inferential purposes. Once the posterior is available, one can proceed to computation of other quantities of interest in Bayesian statistics, such as Bayes point estimates, Bayes factors and so on.
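As an illustration of the likelihood (2), the following minimal sketch simulates the independent Gaussian increments of $X$ on the grid $i/n$ and evaluates $\log L_n(\sigma)$. The function names, the midpoint quadrature and the two coefficients `sigma0` and `sigma_alt` are assumptions made for this example only and do not come from the paper.

```python
import math
import random


def integrate_sq(sigma, a, b, m=20):
    """Midpoint-rule approximation of the integral of sigma(u)^2 over [a, b]."""
    h = (b - a) / m
    return sum(sigma(a + (j + 0.5) * h) ** 2 for j in range(m)) * h


def simulate_increments(sigma0, n, rng):
    """Draw the n independent increments X_{t_i} - X_{t_{i-1}}, t_i = i/n.

    Each increment is centred Gaussian with variance equal to the
    integral of sigma0^2 over [(i-1)/n, i/n].
    """
    return [
        rng.gauss(0.0, math.sqrt(integrate_sq(sigma0, (i - 1) / n, i / n)))
        for i in range(1, n + 1)
    ]


def log_likelihood(sigma, increments):
    """Logarithm of L_n(sigma) in (2), with psi(u) = exp(-u^2 / 2)."""
    n = len(increments)
    total = 0.0
    for i, dx in enumerate(increments, start=1):
        v = integrate_sq(sigma, (i - 1) / n, i / n)
        total += -0.5 * math.log(2.0 * math.pi * v) - dx * dx / (2.0 * v)
    return total


rng = random.Random(0)
sigma0 = lambda t: 1.0 + 0.5 * t   # hypothetical true coefficient
sigma_alt = lambda t: 2.0          # hypothetical competitor
dx = simulate_increments(sigma0, 2000, rng)
# Log-likelihood ratio of the true coefficient against the competitor.
print(log_likelihood(sigma0, dx) - log_likelihood(sigma_alt, dx))
```

Working with increments rather than with the observations themselves is convenient because the increments are independent, which is what makes the product form in (2) possible.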

It has long been recognised that Bayesian procedures should be theoretically grounded through establishing posterior consistency, see e.g. Diaconis and Freedman (1986). In our context posterior consistency will mean that for every neighbourhood $U_{\sigma_0}$ of $\sigma_0$ (in a suitable topology)

(3) $\Pi(U_{\sigma_0}^c \mid X_{t_{0,n}}, \ldots, X_{t_{n,n}}) \xrightarrow{P_{\sigma_0}} 0$

as $n \to \infty$ (the notation $\xi_n \xrightarrow{P_{\sigma_0}} \xi$ in (3) and below stands for convergence of a sequence of random variables $\xi_n$ to a random variable $\xi$ in $P_{\sigma_0}$-probability). In other words, a consistent Bayesian procedure asymptotically puts posterior mass equal to one on every fixed neighbourhood of the true parameter. This is similar to the study of consistency of frequentist estimators. A method that does not appear to work in the idealised setting when an infinite amount of data is available (formalised by assuming that the sample size $n \to \infty$) should also be unattractive in the finite sample setting. Hence the importance of a study of posterior consistency. The situation is typically quite subtle in the infinite-dimensional Bayesian setting: it is known that a careless choice of the prior might render a Bayes procedure inconsistent. For an introduction to consistency issues in Bayesian non-parametric statistics see e.g. Ghosal et al. (1999) and Wasserman (1998).

Our task in this work is to establish (3) under suitable assumptions on the class of dispersion coefficients $\sigma$ and the prior $\Pi$. Asymptotic properties of Bayesian procedures in estimation problems for stochastic differential equations have already been considered under various setups in Gugushvili and Spreij (2012), van der Meulen et al. (2006), van der Meulen and van Zanten (2013), Panzar and van Zanten (2009) and Pokern et al. (2013), primarily in the context of non-parametric Bayesian estimation of the drift coefficient of a stochastic differential equation. Computational approaches to non-parametric Bayesian inference for stochastic differential equations were studied in van der Meulen et al. (2013) and Papaspiliopoulos et al. (2012). Convenient overviews of the available results are given in Pavliotis et al. (2012) and van Zanten (2013). However, in the above works dealing with Bayesian asymptotics it is assumed that either a continuous record of observations is available on the solution to a stochastic differential equation, or that the solution is observed at equispaced time points $\Delta, 2\Delta, \ldots, n\Delta$, with asymptotics treated in the latter case under the assumption that $\Delta$ is independent of $n$ and $n \to \infty$. Our problem, on the other hand, requires a different approach due to a different sampling scheme and the fact that ergodicity of the solution to a stochastic differential equation, which played a prominent role in most of the previous works on the non-parametric Bayesian approach to statistical inference for stochastic differential equations, is irrelevant in our case. Although the setup we consider looks simple, to the best of our knowledge our work is the first one to treat an inference problem for a stochastic differential equation in the so-called high-frequency data case, when $\Delta = \Delta_n \to 0$ as $n \to \infty$, using a non-parametric Bayesian approach. The high-frequency data setting is particularly relevant in financial mathematics, where asset prices are often modelled through stochastic differential equations and where huge amounts of observations on them separated by very short time instances are available. Perhaps the most interesting feature of the present work is the method of proof of posterior consistency, which differs in certain respects from the currently used techniques. See Section 4 for a discussion. Also the simplicity of our model should not necessarily be considered a disadvantage: indeed, the model is somewhat similar to the Gaussian white noise model (see e.g. Chapter 7, §4 in Ibragimov and Has'minskii (1979)), which, as is known, has triggered some important developments in mathematical statistics.

The paper is organised as follows: in the next section we formulate our main result dealing with posterior consistency for non-parametric estimation of the dispersion coefficient. Since posterior consistency is closely linked with properties of a prior $\Pi$, in Section 3 we provide an example of a reasonable prior satisfying the assumptions made in Section 2. Section 4 contains a brief discussion on the obtained result. The proof of our main theorem is deferred until Section 5, while the Appendix contains two technical lemmas used in Section 5.

2. Results 

The non-parametric class of dispersion coefficients we will be looking at is given 
in the following definition. 

Definition 1. Let $\mathcal{X}$ be the collection of dispersion coefficients $\sigma : [0,1] \to [\kappa, K]$, such that $\sigma \in \mathcal{X}$ is Lipschitz with Lipschitz constant $M$. Here $0 < \kappa < K < \infty$ and $0 < M < \infty$ are three constants independent of the particular $\sigma \in \mathcal{X}$.

Remark 1. Note that for a constant $\sigma$ we have $P_\sigma = P_{-\sigma}$. A positivity assumption on $\sigma \in \mathcal{X}$ in Definition 1 can hence be viewed as a simple and natural identifiability requirement. The strict positivity assumption $\sigma \geq \kappa > 0$ allows one to escape technical complications when manipulating the likelihood (2) (this condition has already appeared e.g. in Hoffmann (1997)), while the upper bound $\sigma(t) \leq K$, $t \in [0,1]$, restricts the size of the non-parametric class $\mathcal{X}$ and is reasonable in light of Definition 2 given below. Finally, Lipschitz continuity of $\sigma$ comes in handy at various stages of the proof of posterior consistency. □

The notion of posterior consistency depends on a topology on X. 

Definition 2. The topology $\mathcal{T}$ on $\mathcal{X}$ is the topology induced by the $L_2$-norm $\| \cdot \|_2$.

We now formalise the concept of posterior consistency. 

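Definition 3. Let the prior $\Pi$ be defined on $\mathcal{X}$. We say that posterior consistency holds, if for any fixed $\sigma_0 \in \mathcal{X}$ and every neighbourhood $U_{\sigma_0}$ of $\sigma_0$ in the topology $\mathcal{T}$ from Definition 2 we have

$\Pi(U_{\sigma_0}^c \mid X_{t_{0,n}}, \ldots, X_{t_{n,n}}) \xrightarrow{P_{\sigma_0}} 0$

as $n \to \infty$.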

We summarise our assumptions. 

Assumption 1. Assume that

(a) the model (1) is given with $x = 0$ and $\sigma \in \mathcal{X}$, where $\mathcal{X}$ is defined in Definition 1,

(b) $\sigma_0 \in \mathcal{X}$ denotes the true dispersion coefficient,

(c) a discrete-time sample $\{X_{t_{i,n}}\}$ from the solution to (1) corresponding to $\sigma_0$ is available, where $t_{i,n} = i/n$, $i = 0, \ldots, n$.

Let

$V_{\sigma_0,\varepsilon} = \{\sigma \in \mathcal{X} : \|\sigma - \sigma_0\|_\infty < \varepsilon\},$

where $\| \cdot \|_\infty$ denotes the $L_\infty$-norm. The following is the main result of the paper.

Theorem 1. Under Assumption 1 posterior consistency as in Definition 3 holds, provided the prior $\Pi$ on $\mathcal{X}$ satisfies

(4) $\Pi(V_{\sigma_0,\varepsilon}) > 0$

for any $\varepsilon > 0$ and at any $\sigma_0 \in \mathcal{X}$.
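To make the statement of posterior consistency concrete, here is a toy numerical sketch in which the prior is supported on just three candidate coefficients, one of which is the truth. This finite prior, the candidate functions and all names are assumptions chosen for illustration; it is far simpler than the non-parametric priors covered by Theorem 1 and only shows the posterior mass of the true coefficient growing with $n$.

```python
import math
import random


def cell_variances(sigma, n, m=10):
    """Midpoint-rule integrals of sigma^2 over the cells [(i-1)/n, i/n]."""
    h = 1.0 / (n * m)
    return [
        sum(sigma((i * m + j + 0.5) * h) ** 2 for j in range(m)) * h
        for i in range(n)
    ]


def log_lik(variances, increments):
    """Gaussian log-likelihood of the increments, as in (2)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - dx * dx / (2.0 * v)
        for v, dx in zip(variances, increments)
    )


def posterior(candidates, prior, increments):
    """Posterior weights of finitely many candidates via Bayes' formula."""
    n = len(increments)
    logs = [
        math.log(p) + log_lik(cell_variances(s, n), increments)
        for s, p in zip(candidates, prior)
    ]
    top = max(logs)                       # subtract the maximum for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical candidates; the first one plays the role of sigma_0.
CANDIDATES = [lambda t: 1.0 + 0.5 * t, lambda t: 1.25, lambda t: 1.5 - 0.5 * t]
PRIOR = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]


def run_experiment(n, seed):
    """Simulate data under the first candidate and return the posterior."""
    rng = random.Random(seed)
    v0 = cell_variances(CANDIDATES[0], n)
    increments = [rng.gauss(0.0, math.sqrt(v)) for v in v0]
    return posterior(CANDIDATES, PRIOR, increments)


for n in (50, 500, 5000):
    print(n, [round(p, 3) for p in run_experiment(n, seed=1)])
```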



3. Example of a prior 

In this section we provide an example of a prior satisfying condition (4). Fix $0 < \kappa < K < \infty$ and take a fixed Lipschitz continuous function $f : \mathbb{R} \to [0, K - \kappa]$ with Lipschitz constant $N > 0$ and set $\sigma(t) = \kappa + \int_0^t f(h(s))\,ds$, where $h : [0,1] \to \mathbb{R}$ ranges over the set of Hölder continuous functions of order $\beta \in (0, 1/2)$ on $[0,1]$ for some fixed $\beta$. Then each $\sigma$ maps the interval $[0,1]$ into the interval $[\kappa, K]$ and $\sigma$ is also Lipschitz with Lipschitz constant $K$, because

$|\sigma(t) - \sigma(s)| = \left| \int_s^t f(h(u))\,du \right| \leq K |t - s|.$

We take the collection of these functions $\sigma$ as the collection $\mathcal{X}$ from Assumption 1 (a). We will now construct a prior $\Pi$ on $\mathcal{X}$. Let $W = (W_t)_{0 \leq t \leq 1}$ be a standard Brownian motion over the time interval $[0,1]$ and let $Z$ be a standard normal random variable independent of $W$. Define the Brownian motion $\widetilde{W} = (\widetilde{W}_t)_{0 \leq t \leq 1}$ initialised at $Z$ by $\widetilde{W}_t = Z + W_t$ and introduce the process $Y = (Y_t)_{0 \leq t \leq 1}$, where

$Y_t = \kappa + \int_0^t f(\widetilde{W}_s)\,ds.$

Our prior $\Pi$ on $\mathcal{X}$ will be the law of the process $Y$.
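The construction of the prior can be mimicked numerically. The sketch below draws an approximate path of $Y$ on a grid by combining an Euler-type simulation of the Brownian motion started at $Z$ with a left Riemann sum for the integral. The values of `KAPPA` and `K` and the scaled logistic choice of `f` are hypothetical; any Lipschitz map into $[0, K - \kappa]$ would serve equally well.

```python
import math
import random

KAPPA, K = 0.5, 2.0   # hypothetical bounds with 0 < KAPPA < K


def f(x):
    """Hypothetical Lipschitz map from R into [0, K - KAPPA] (scaled logistic)."""
    return (K - KAPPA) / (1.0 + math.exp(-x))


def sample_prior_path(m, rng):
    """Approximate draw of (Y_t) on the grid j/m, j = 0, ..., m.

    The driving path starts at a standard normal value Z and moves by
    independent N(0, 1/m) steps; the integral of f along the path is
    approximated by a left Riemann sum.
    """
    w = rng.gauss(0.0, 1.0)              # value of the driving path at time 0
    y = [KAPPA]                          # Y_0 = kappa
    for _ in range(m):
        y.append(y[-1] + f(w) / m)       # Riemann-sum update of the integral
        w += rng.gauss(0.0, math.sqrt(1.0 / m))
    return y


path = sample_prior_path(1000, random.Random(2))
print(path[0], path[500], path[-1])
```

Since $f$ is non-negative, every draw is a non-decreasing function with values in $[\kappa, K]$, in line with the bounds derived above.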

We have to check that the prior $\Pi$ satisfies (4). To that end take a fixed $\sigma_0(t) = \kappa + \int_0^t f(h_0(s))\,ds$ and let $w$ be a generic realisation of the process $\widetilde{W}$, so that $y_t = \kappa + \int_0^t f(w_s)\,ds$ is the corresponding generic realisation of the process $Y$. We have

$\Pi(V_{\sigma_0,\varepsilon}) = \Pi(y : \|y - \sigma_0\|_\infty < \varepsilon) \geq \Pi\left(w : \|w - h_0\|_\infty < \frac{\varepsilon}{N}\right),$

because

$\sup_{t \in [0,1]} \left| \int_0^t [f(w_s) - f(h_0(s))]\,ds \right| \leq \|f(w) - f(h_0)\|_\infty \leq N \|w - h_0\|_\infty.$

By Lemma 5.3 in van der Vaart and van Zanten (2008b),

$\Pi\left(w : \|w - h_0\|_\infty < \frac{\varepsilon}{N}\right) \geq \exp\left( - \inf_{g : \|g - h_0\|_\infty < \varepsilon/(2N)} \frac{1}{2} \|g\|_{\mathbb{H}}^2 \right) \Pi\left( \|\widetilde{W}\|_\infty < \frac{\varepsilon}{2N} \right).$

Here $\mathbb{H}$ denotes the Reproducing Kernel Hilbert Space (RKHS) of the process $\widetilde{W}$, $g \in \mathbb{H}$, while $\| \cdot \|_{\mathbb{H}}$ is the RKHS norm (see van der Vaart and van Zanten (2008b) for a detailed treatment of these concepts with a view towards non-parametric Bayesian statistics). In our case $\mathbb{H}$ consists of absolutely continuous functions $g : [0,1] \to \mathbb{R}$, such that $\|g'\|_2 < \infty$, while the RKHS norm is given by $\|g\|_{\mathbb{H}} = \sqrt{g^2(0) + \|g'\|_2^2}$, see p. 1446 in van der Vaart and van Zanten (2008a). Note that the set $\{g \in \mathbb{H} : \|g - h_0\|_\infty < \varepsilon/(2N)\}$ is not empty, as $h_0$ can be approximated arbitrarily closely in the $L_\infty$-norm by the convolution $h_0 * k_b$ of $h_0$ with a smooth kernel $k_b(\cdot) = (1/b) k(\cdot/b)$, cf. p. 1446 in van der Vaart and van Zanten (2008a) (we assume that $\|k'\|_2 < \infty$ and $b \to 0$). Furthermore, $\Pi(\|\widetilde{W}\|_\infty < \varepsilon/(2N)) > 0$, because $\|\widetilde{W}\|_\infty$ has a strictly positive density. Condition (4) easily follows.

In case one is interested in a smoother class of dispersion coefficients $\sigma$ than what we have just constructed, one can simply take in the above construction of the prior $\Pi$ a smoother, say $\beta$ times differentiable function $f$, and replace the Brownian motion $\widetilde{W}$ with a Riemann-Liouville process $R = (R_t)_{0 \leq t \leq 1}$ with Hurst parameter $\beta$,

$R_t = \sum_{k=0}^{\beta} Z_k t^k + \int_0^t (t - s)^{\beta - 1/2}\,dW_s,$

where the $Z_k$'s are standard normal random variables, $W$ is a standard Brownian motion and $Z_0, Z_1, \ldots, Z_\beta, W$ are independent. See Section 4.2 in van der Vaart and van Zanten (2008a) for more information on the Riemann-Liouville processes. Arguments similar to the ones given above yield that in this case as well (4) is satisfied.



4. Discussion 

In the present work we established posterior consistency for a statistical model obtained from a simple linear stochastic differential equation. General techniques for proving posterior consistency for a wide range of statistical models are by now well-developed. In the i.i.d. setting, broadly speaking, two main approaches exist in the literature: a 'classical' approach as epitomised e.g. by Barron et al. (1999) and Schwartz (1965) (we combine these two papers into one category, because they in some sense make use of assumptions of similar type, although their actual assertions are different), and a martingale approach developed more recently in Walker (2003) and Walker (2004). The first approach was extended to the setting of independent non-identically distributed observations in Choudhuri et al. (2004), see in particular Theorem A.1 there. The second approach was extended to the case of discretely observed Markov processes in Ghosal and Tang (2006). A general theorem for posterior consistency in Choudhuri et al. (2004) makes two requirements: firstly, the prior must put sufficient mass in arbitrarily small neighbourhoods of the true parameter (in an appropriate topology), and secondly, a sequence of sieves (increasing sequence of subsets of the parameter set) guaranteeing existence of certain exponentially consistent tests has to be exhibited; see p. 1056 in Choudhuri et al. (2004) for additional details. Although in our setting the observations $X_{t_{i,n}}$, $i = 1, \ldots, n$, are not independent, the increments $X_{t_{i,n}} - X_{t_{i-1,n}}$ are, and it appears conceivable that Theorem A.1 in Choudhuri et al. (2004) could be used to establish posterior consistency in our model as well. However, we opted for a different approach, see the proof of our posterior consistency result, Theorem 1. A similarity shared by Theorem A.1 in Choudhuri et al. (2004) and Theorem 1 is that both theorems require that the prior puts sufficient mass in arbitrarily small neighbourhoods of the true parameter (in appropriate topologies). A difference is that due to the special structure of our model we do not need to make any reference to tests and sieves, but can establish posterior consistency by directly manipulating the posterior; see the proof of Theorem 1 for details. In this sense our approach to proving posterior consistency appears to be more direct and more elementary than the one that would employ Theorem A.1 in Choudhuri et al. (2004). Neither do we make reference to entropy arguments as done e.g. in Barron et al. (1999). As far as the martingale approach to posterior consistency for ergodic Markov processes is concerned, we can be brief here: ergodicity is irrelevant in our setting and in fact our special sampling scheme seems to make generalisation or modification of the arguments from Ghosal and Tang (2006), Walker (2003) and Walker (2004) impossible.

Next a brief remark on condition (4) on the prior $\Pi$ is in order. Although it is formulated in terms of neighbourhoods in the $L_\infty$-norm, the assertion returned by Theorem 1 employs the topology induced by the $L_2$-norm. A 'discrepancy' between norms used is however not uncommon in posterior consistency results. See for instance Barron et al. (1999).



5. Proofs 

Proof of Theorem 1. Let $U_{\sigma_0}$ be an arbitrary, but fixed neighbourhood of $\sigma_0$ in the topology $\mathcal{T}$ and let $U_{\sigma_0,\varepsilon} = \{\sigma \in \mathcal{X} : \|\sigma - \sigma_0\|_2 < \varepsilon\}$. There exists $\varepsilon > 0$, such that $U_{\sigma_0,\varepsilon} \subset U_{\sigma_0}$, and hence $U_{\sigma_0}^c \subset U_{\sigma_0,\varepsilon}^c$. Fix such an $\varepsilon$. In order to prove the theorem, it thus suffices to show that

(5) $\Pi(U_{\sigma_0,\varepsilon}^c \mid X_{t_{0,n}}, \ldots, X_{t_{n,n}}) \xrightarrow{P_{\sigma_0}} 0$

as $n \to \infty$. Write

(6) $\Pi(U_{\sigma_0,\varepsilon}^c \mid X_{t_{0,n}}, \ldots, X_{t_{n,n}}) = \frac{\int_{U_{\sigma_0,\varepsilon}^c} L_n(\sigma)\,\Pi(d\sigma)}{\int_{\mathcal{X}} L_n(\sigma)\,\Pi(d\sigma)} = \frac{\int_{U_{\sigma_0,\varepsilon}^c} R_n(\sigma)\,\Pi(d\sigma)}{\int_{\mathcal{X}} R_n(\sigma)\,\Pi(d\sigma)},$

where $R_n(\sigma) = L_n(\sigma)/L_n(\sigma_0)$ denotes the likelihood ratio. We will separately bound the numerator and denominator on the right-hand side of the last equality in (6) (we will use the notation $D_n$ for the denominator and $N_n$ for the numerator) and then combine the bounds to establish (5). As we will see, the left-hand side of (5) in fact decays exponentially fast to zero. Note that when establishing posterior consistency, Barron et al. (1999) and Walker (2004) also treat the numerator and denominator in the expression for the posterior separately, but similarity of our approach to the one in those papers largely ends here.

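Let $S_n(\sigma) = n^{-1} \log R_n(\sigma)$. Then $D_n = \int_{\mathcal{X}} \exp(n S_n(\sigma))\,\Pi(d\sigma)$. Now

$S_n(\sigma) = \frac{1}{2n} \sum_{i=1}^{n} \log\left( \frac{\int_{t_{i-1,n}}^{t_{i,n}} \sigma_0^2(u)\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} \right) - \frac{1}{2n} \sum_{i=1}^{n} \left\{ \frac{(X_{t_{i,n}} - X_{t_{i-1,n}})^2}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} - \frac{(X_{t_{i,n}} - X_{t_{i-1,n}})^2}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma_0^2(u)\,du} \right\} = T_{1,n}(\sigma) + T_{2,n}(\sigma)$

with obvious definitions of $T_{1,n}(\sigma)$ and $T_{2,n}(\sigma)$. Let $\tilde{\varepsilon} > 0$ be a constant with its value to be chosen appropriately later on. Since $D_n \geq \int_{V_{\sigma_0,\tilde{\varepsilon}}} R_n(\sigma)\,\Pi(d\sigma)$, Lemmas 1 and 2 from the Appendix and formula (12) give that with probability tending to one,

$D_n \geq \int_{V_{\sigma_0,\tilde{\varepsilon}}} \exp\left( - \frac{4K}{\kappa^2} \tilde{\varepsilon} n \right) \Pi(d\sigma) = \exp\left( - \frac{4K}{\kappa^2} \tilde{\varepsilon} n \right) \Pi(V_{\sigma_0,\tilde{\varepsilon}}).$

By assumption (4), $\Pi(V_{\sigma_0,\tilde{\varepsilon}}) > 0$. Fix a constant $\beta = 5K\tilde{\varepsilon}/\kappa^2$. Then for all $n$ large enough,

$\exp\left( - \frac{4K}{\kappa^2} \tilde{\varepsilon} n \right) \Pi(V_{\sigma_0,\tilde{\varepsilon}}) > e^{-\beta n}.$

As a consequence, with probability tending to one,

(7) $D_n > e^{-\beta n}$

as $n \to \infty$. This is our required lower bound for the denominator $D_n$.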

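Using similar techniques, we will next treat the numerator $N_n$. Firstly, note that by elementary arguments one can show that for an arbitrary fixed constant $C > 0$ there exists another constant $c > 0$, such that the inequality

$\log(1 + y) \leq y - c y^2, \quad -1 < y \leq C,$

holds (one can take $c = [2(C + 1)]^{-1}$). Hence

$\log\left( \frac{\int_{t_{i-1,n}}^{t_{i,n}} \sigma_0^2(u)\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} \right) \leq \frac{\int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} - c \left( \frac{\int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} \right)^2$

for some constant $c$ independent of $\sigma \in \mathcal{X}$, $i$ and $n$. Therefore, after a simple, but lengthy computation employing Assumption 1 (a), cf. the proof of Lemma 1,

$T_{1,n}(\sigma) \leq \frac{1}{2n} \sum_{i=1}^{n} \frac{\int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} - \frac{c}{2n} \sum_{i=1}^{n} \left( \frac{\int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} \right)^2 = \frac{1}{2} \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma^2(u)}\,du - \frac{c}{2} \int_0^1 \frac{(\sigma_0^2(u) - \sigma^2(u))^2}{\sigma^4(u)}\,du + O\left(\frac{1}{n}\right),$

where the remainder term is of order $n^{-1}$ uniformly in $\sigma \in \mathcal{X}$. Hence

(8) $S_n(\sigma) \leq - \frac{c}{2} \int_0^1 \frac{(\sigma_0^2(u) - \sigma^2(u))^2}{\sigma^4(u)}\,du$

(9) $\qquad + T_{2,n}(\sigma) + \frac{1}{2} \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma^2(u)}\,du$

(10) $\qquad + O\left(\frac{1}{n}\right).$

To bound from above the term on the right-hand side of inequality (8), use the fact that

$\frac{c}{2} \int_0^1 \frac{(\sigma_0^2(u) - \sigma^2(u))^2}{\sigma^4(u)}\,du \geq \frac{2 c \kappa^2}{K^4} \varepsilon^2$

for $\sigma \in U_{\sigma_0,\varepsilon}^c$; this holds because $(\sigma_0 + \sigma)^2 \geq 4\kappa^2$, $\sigma^4 \leq K^4$ and $\|\sigma - \sigma_0\|_2 \geq \varepsilon$ on $U_{\sigma_0,\varepsilon}^c$. Furthermore, by Lemma 2, uniformly in $\sigma \in U_{\sigma_0,\varepsilon}^c$ and with probability tending to one as $n \to \infty$, the term (9) is smaller than any positive number fixed beforehand. So is the term (10). Therefore, uniformly in $\sigma \in U_{\sigma_0,\varepsilon}^c$ and with probability tending to one as $n \to \infty$,

$S_n(\sigma) \leq - \frac{c \kappa^2}{K^4} \varepsilon^2,$

say. Thus with probability tending to one as $n \to \infty$,

(11) $N_n = \int_{U_{\sigma_0,\varepsilon}^c} \exp(n S_n(\sigma))\,\Pi(d\sigma) \leq \exp\left( - \frac{c \kappa^2}{K^4} \varepsilon^2 n \right).$

This finishes bounding the numerator $N_n$.

We now combine bounds (7) and (11) to conclude that with probability tending to one as $n \to \infty$,

$\Pi(U_{\sigma_0,\varepsilon}^c \mid X_{t_{0,n}}, \ldots, X_{t_{n,n}}) \leq \exp\left( - \left[ \frac{c \kappa^2}{K^4} \varepsilon^2 - \frac{5K}{\kappa^2} \tilde{\varepsilon} \right] n \right).$

Picking $\tilde{\varepsilon}$ small enough, so that

$\frac{c \kappa^2}{K^4} \varepsilon^2 > \frac{5K}{\kappa^2} \tilde{\varepsilon},$

implies (5) and completes the proof of the theorem. □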
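The elementary inequality $\log(1+y) \leq y - cy^2$ on $-1 < y \leq C$ with $c = [2(C+1)]^{-1}$, used in the proof to bound the numerator, can be checked numerically. The short sketch below evaluates the gap between the two sides on a grid; the value `C = 3.0` and the grid are arbitrary choices for illustration, and a grid check is of course no substitute for the analytic argument.

```python
import math


def gap(y, C):
    """Right-hand side minus left-hand side of log(1 + y) <= y - c * y^2."""
    c = 1.0 / (2.0 * (C + 1.0))
    return (y - c * y * y) - math.log1p(y)


C = 3.0                                   # arbitrary illustrative constant
grid = [-0.999 + k * (C + 0.999) / 4000 for k in range(4001)]
worst = min(gap(y, C) for y in grid)
# A non-negative value (up to rounding) means the inequality held on the grid.
print(worst)
```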

Appendix 

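Lemma 1. Under the same assumptions as in Theorem 1, for $T_{1,n}(\sigma)$ as in the proof of Theorem 1, all $\sigma \in V_{\sigma_0,\tilde{\varepsilon}}$ simultaneously and for $n$ large enough, $T_{1,n}(\sigma) \geq -2K\tilde{\varepsilon}/\kappa^2$.

Proof. The elementary inequality

$\frac{y}{y + 1} \leq \log(1 + y), \quad y > -1,$

gives that

$\log\left( \frac{\int_{t_{i-1,n}}^{t_{i,n}} \sigma_0^2(u)\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} \right) \geq \frac{\int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma_0^2(u)\,du}.$

Next, employing Assumption 1 (a) and (c), by a simple computation one can show that

$\frac{\int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma_0^2(u)\,du} = \frac{\sigma_0^2(t_{i-1,n}) - \sigma^2(t_{i-1,n})}{\sigma_0^2(t_{i-1,n})} + O\left(\frac{1}{n}\right),$

where the remainder term is of order $n^{-1}$ uniformly in $\sigma \in \mathcal{X}$. Therefore,

$T_{1,n}(\sigma) \geq \frac{1}{2n} \sum_{i=1}^{n} \frac{\sigma_0^2(t_{i-1,n}) - \sigma^2(t_{i-1,n})}{\sigma_0^2(t_{i-1,n})} + O\left(\frac{1}{n}\right).$

By another simple computation,

$\frac{1}{n} \sum_{i=1}^{n} \frac{\sigma_0^2(t_{i-1,n}) - \sigma^2(t_{i-1,n})}{\sigma_0^2(t_{i-1,n})} = \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma_0^2(u)}\,du + O\left(\frac{1}{n}\right),$

where the remainder term is of order $n^{-1}$ uniformly in $\sigma \in \mathcal{X}$. For $\sigma \in V_{\sigma_0,\tilde{\varepsilon}}$ we have under Assumption 1 (a) that

(12) $\left| \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma_0^2(u)}\,du \right| \leq \frac{2K}{\kappa^2} \tilde{\varepsilon}.$

This implies the statement of the lemma. □

Lemma 2. Denote

$Q_n(\sigma) = T_{2,n}(\sigma) + \frac{1}{2} \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma^2(u)}\,du,$

where $T_{2,n}(\sigma)$ is defined in the proof of Theorem 1. Then under the same assumptions as in Theorem 1, $\sup_{\sigma \in \mathcal{X}} |Q_n(\sigma)| \xrightarrow{P_{\sigma_0}} 0$ as $n \to \infty$. Furthermore, for any fixed $\tilde{\varepsilon} > 0$, for all $\sigma \in V_{\sigma_0,\tilde{\varepsilon}}$ simultaneously, with probability tending to one as $n \to \infty$, $T_{2,n}(\sigma) \geq -2K\tilde{\varepsilon}/\kappa^2$.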
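The quantity $Q_n(\sigma)$ from Lemma 2 is easy to compute on simulated data, which gives a feel for the first statement of the lemma at a single fixed $\sigma$. The sketch below is an illustration under assumptions: the coefficients `sigma0` and `sigma`, the midpoint quadrature and all function names are hypothetical, and the code looks at one $\sigma$ only rather than at the supremum over the class.

```python
import math
import random


def cell_integrals(g, n, m=10):
    """Midpoint-rule integrals of g over the cells [(i-1)/n, i/n], i = 1..n."""
    h = 1.0 / (n * m)
    return [sum(g((i * m + j + 0.5) * h) for j in range(m)) * h for i in range(n)]


def q_n(sigma, sigma0, increments):
    """Q_n(sigma) = T_{2,n}(sigma) + (1/2) int_0^1 (sigma0^2 - sigma^2) / sigma^2 du."""
    n = len(increments)
    v = cell_integrals(lambda u: sigma(u) ** 2, n)
    v0 = cell_integrals(lambda u: sigma0(u) ** 2, n)
    t2 = -sum(
        dx * dx * (1.0 / a - 1.0 / b) for dx, a, b in zip(increments, v, v0)
    ) / (2.0 * n)
    centring = 0.5 * sum(
        cell_integrals(lambda u: (sigma0(u) ** 2 - sigma(u) ** 2) / sigma(u) ** 2, n)
    )
    return t2 + centring


def simulate(sigma0, n, rng):
    """Independent Gaussian increments of X on the grid i/n under sigma0."""
    v0 = cell_integrals(lambda u: sigma0(u) ** 2, n)
    return [rng.gauss(0.0, math.sqrt(v)) for v in v0]


sigma0 = lambda t: 1.0 + 0.5 * t   # hypothetical true coefficient
sigma = lambda t: 1.25             # hypothetical fixed element of the class
for n in (100, 1000, 20000):
    print(n, q_n(sigma, sigma0, simulate(sigma0, n, random.Random(n))))
```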

Proof. The first statement of the lemma will be derived from an application of Theorem 18.14 in van der Vaart (1998). In particular, viewing $Q_n$ as a process on $\mathcal{X}$ with bounded sample paths, we will show that it converges in distribution to a zero process on $\mathcal{X}$. The first statement of the lemma will then be a consequence of equivalence of convergence in distribution and in probability for constant limits. Note that in order to circumvent possible (non-)measurability issues, outer probability is employed in the formulation of Theorem 18.14 in van der Vaart (1998) (see Section 18.2 in van der Vaart (1998) for more information on outer probability). Since no such problems will arise in our setting, we can instead directly work under probability $P_{\sigma_0}$. Indeed, the summands in $T_{2,n}(\sigma)$ are of the form $F_{i,n}(\sigma)(X_{t_{i,n}} - X_{t_{i-1,n}})^2$, with $F_{i,n}$ the obvious functional of $\sigma$. Hence taking the supremum over $\sigma$ does not affect the measurability property of $T_{2,n}(\sigma)$.

In order to apply Theorem 18.14 from van der Vaart (1998), we need to verify its conditions. In our setting they reduce to the following ones: firstly, marginal vectors of $Q_n$ must converge in distribution to zero vectors, i.e.

(13) $(Q_n(\sigma_1), \ldots, Q_n(\sigma_\ell)) \rightsquigarrow (0, \ldots, 0), \quad \ell \in \mathbb{N}.$

Secondly, the tightness condition must be satisfied: for arbitrary constants $\eta > 0$ and $\varepsilon > 0$, one must be able to find a partition of $\mathcal{X}$ into finitely many sets $\mathcal{X}_1, \ldots, \mathcal{X}_\ell$, such that

(14) $\limsup_{n \to \infty} P_{\sigma_0}\left( \sup_{1 \leq k \leq \ell} \; \sup_{\sigma_1, \sigma_2 \in \mathcal{X}_k} |Q_n(\sigma_1) - Q_n(\sigma_2)| \geq \varepsilon \right) \leq \eta.$

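Denote

$\mathcal{F}_{i,n} = \sigma(X_{t_{j,n}}, j = 1, \ldots, i)$

and

$\mathcal{X}_{i,n}(\sigma) = - \frac{1}{2n} (X_{t_{i,n}} - X_{t_{i-1,n}})^2 \frac{\int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du \int_{t_{i-1,n}}^{t_{i,n}} \sigma_0^2(u)\,du}.$

Also let $E_{\sigma_0}$ be the expectation operator with respect to measure $P_{\sigma_0}$. Note that

$\sum_{i=1}^{n} E_{\sigma_0}(\mathcal{X}_{i,n}(\sigma) \mid \mathcal{F}_{i-1,n}) = - \frac{1}{2n} \sum_{i=1}^{n} \frac{\int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du}{\int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du} = - \frac{1}{2} \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma^2(u)}\,du + O\left(\frac{1}{n}\right),$

where the remainder term is of order $n^{-1}$ uniformly in $\sigma \in \mathcal{X}$. Furthermore, Assumption 1 (a) yields that

$E_{\sigma_0}(\mathcal{X}_{i,n}^2(\sigma) \mid \mathcal{F}_{i-1,n}) = \frac{3}{4n^2} \frac{\left[ \int_{t_{i-1,n}}^{t_{i,n}} [\sigma_0^2(u) - \sigma^2(u)]\,du \right]^2}{\left[ \int_{t_{i-1,n}}^{t_{i,n}} \sigma^2(u)\,du \right]^2} = O\left(\frac{1}{n^2}\right),$

where the order bound is uniform in $\sigma \in \mathcal{X}$. It follows that

$\sum_{i=1}^{n} E_{\sigma_0}(\mathcal{X}_{i,n}^2(\sigma) \mid \mathcal{F}_{i-1,n}) \to 0.$

Lemma 9 in Genon-Catalot and Jacod (1993) then implies that $Q_n(\sigma) \xrightarrow{P_{\sigma_0}} 0$. This verifies (13).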

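We will now check (14). Fix $\varepsilon$ and $\eta$ in (14). By a lengthy, but simple computation employing Assumption 1 (a) and the triangle inequality,

(15) $|Q_n(\sigma_1) - Q_n(\sigma_2)| \leq \frac{K}{\kappa^4} \|\sigma_1 - \sigma_2\|_\infty \sum_{i=1}^{n} (X_{t_{i,n}} - X_{t_{i-1,n}})^2 + \frac{K^3}{\kappa^4} \|\sigma_1 - \sigma_2\|_\infty.$

By the Arzelà-Ascoli theorem, under Assumption 1 (a) the family $\mathcal{X}$ is totally bounded for the supremum metric. By definition this means that for every $\zeta > 0$ there exists a finite set $\widetilde{\mathcal{X}} \subset \mathcal{X}$, such that for any $\sigma \in \mathcal{X}$ there is some $\tilde{\sigma} \in \widetilde{\mathcal{X}}$ with $\|\tilde{\sigma} - \sigma\|_\infty < \zeta/2$. This and the triangle inequality imply existence of a finite partition $\mathcal{X}_1, \ldots, \mathcal{X}_\ell$ of $\mathcal{X}$, such that

(16) $\sup_{1 \leq k \leq \ell} \; \sup_{\sigma_1, \sigma_2 \in \mathcal{X}_k} \|\sigma_1 - \sigma_2\|_\infty < \zeta.$

Furthermore, by the definition of the quadratic variation of the process $X$,

(17) $\sum_{i=1}^{n} (X_{t_{i,n}} - X_{t_{i-1,n}})^2 \xrightarrow{P_{\sigma_0}} \int_0^1 \sigma_0^2(u)\,du.$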
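The convergence (17) of the realised quadratic variation can be observed in a small simulation. In the sketch below the coefficient `sigma0`, for which the limit $\int_0^1 \sigma_0^2(u)\,du$ is available in closed form, and the function name are assumptions made for illustration.

```python
import math
import random


def realised_quadratic_variation(sigma0, n, rng, m=10):
    """Simulate the n increments on the grid i/n and sum their squares."""
    h = 1.0 / (n * m)
    total = 0.0
    for i in range(n):
        # Variance of the i-th increment: integral of sigma0^2 over its cell.
        var = sum(sigma0((i * m + j + 0.5) * h) ** 2 for j in range(m)) * h
        total += rng.gauss(0.0, math.sqrt(var)) ** 2
    return total


sigma0 = lambda t: 1.0 + 0.5 * t       # hypothetical true coefficient
exact = (1.5 ** 3 - 1.0) / 1.5         # integral of (1 + t/2)^2 over [0, 1]
for n in (100, 1000, 20000):
    print(n, realised_quadratic_variation(sigma0, n, random.Random(n)), exact)
```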



Combination of (15)-(17) yields (14) for $\zeta$ small enough, and consequently the first statement of the lemma too. The second statement of the lemma is a consequence of the first one, the fact that for $\sigma \in V_{\sigma_0,\tilde{\varepsilon}}$ an analogue of inequality (12) holds,

$\left| \frac{1}{2} \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma^2(u)}\,du \right| \leq \frac{K}{\kappa^2} \tilde{\varepsilon},$

and of a simple rearrangement

$T_{2,n}(\sigma) = \left\{ T_{2,n}(\sigma) + \frac{1}{2} \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma^2(u)}\,du \right\} - \frac{1}{2} \int_0^1 \frac{\sigma_0^2(u) - \sigma^2(u)}{\sigma^2(u)}\,du.$

This completes the proof of the lemma. □